LPAS: High Efficiency Load Balancing Parallel Data Mining Algorithm
Abstract
Association rule discovery plays an important role in knowledge discovery and data mining, and efficiency is especially crucial for an algorithm that finds frequent itemsets in a large database. Many methods have been proposed to solve this problem. In addition, parallel computing has become a popular trend, for example on cloud platforms, grid systems, and multicore platforms. This paper proposes a high-efficiency, load-balancing parallel data mining method based on Apriori with sorting, called the Load balancing Parallel mining method based on Apriori with Sorting (LPAS). The main goal of the proposed algorithm is to reduce the massive number of duplicated candidates generated by previous methods. Furthermore, the algorithm performs better than previous methods. The experimental results show that computation time drops dramatically as more threads are used. Moreover, the workload was observed to be dispatched equally to each computing unit.

Keywords: parallel data mining; apriori; load balancing; association rules
Similar resources
Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages both homogeneous and heterogeneous computing models. The MapReduce fra...
A New Load Balancing Approach for Parallel FP-Growth
Due to the exponential growth in worldwide information, companies have to deal with an ever-growing amount of digital information. The huge size of the data and the computation volume of new processing applications such as data mining therefore lead to new high-performance parallel processing systems. One of the most important challenges of such applications is quickly and correctly finding the relationship ...
Application of Parallelized Apriori in Grid Computing Environment
The goal of the strategy is to improve the performance and responsiveness of distributed algorithms. Association rule mining algorithms have high computational complexity due to the size of their search space and the high demands of data access. The work aims at mining the data in a grid computing environment, which computes by distributing the data to its clusters and mines it in...
A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
In today’s world, large volumes of data are being continuously generated by many scientific applications, such as bioinformatics or networking. Since each monitored event is usually characterized by a variety of features, high-dimensional datasets have been continuously generated. To extract value from these complex collections of data, different exploratory data mining algorithms can be used to...
Parallel Performance of Adaptive Algorithms with Dynamic Load Balancing
Parallelization of adaptive algorithms leads to problems with parallel efficiency. Adaptation is a method that introduces dynamic perturbations to the computational environment, which in turn causes problems with proper load balance. To ensure proper efficiency of a parallel simulation, it is necessary to perform load balancing whenever a certain threshold of load balance is breached. In this paper au...